
    Better assessment of physical function: item improvement is neglected but essential

    INTRODUCTION: Physical function is a key component of patient-reported outcome (PRO) assessment in rheumatology. Modern psychometric methods, such as Item Response Theory (IRT) and Computerized Adaptive Testing, can materially improve measurement precision at the item level. We present the qualitative and quantitative item-evaluation process used to develop the Patient-Reported Outcomes Measurement Information System (PROMIS) Physical Function item bank. METHODS: The process was stepwise: we searched extensively to identify extant physical function items, then classified and selectively reduced the item pool. Retained items were evaluated for content, clarity, relevance, comprehension, reading level, and translation ease through expert review, patient surveys, focus groups, and cognitive interviews. We then assessed items using classical test theory and IRT, applying confirmatory factor analyses to examine dimensionality and graded response modeling to estimate item parameters. We retained the 20 Legacy (original) Health Assessment Questionnaire Disability Index (HAQ-DI) items and the 10 items of the SF-36's PF-10 for comparison. Subjects came from rheumatoid arthritis, osteoarthritis, and healthy aging cohorts (n = 1,100) and from a national Internet sample of 21,133 subjects. RESULTS: We identified 1,860 items. After qualitative and quantitative evaluation, 124 newly developed PROMIS items composed the PROMIS item bank, which included revised Legacy items with good fit that met IRT model assumptions. The clearest and best-understood items were simple, in the present tense, and straightforward. Basic tasks (like dressing) were rated more relevant and important than complex ones (like dancing). Revised HAQ-DI and PF-10 items with five response options had higher item-information content than comparable original Legacy items with fewer response options. IRT analyses showed that the Physical Function domain satisfied general criteria for unidimensionality, with one-, two-, three-, and four-factor models having comparable fit. Correlations between factors in the test data sets were > 0.90. CONCLUSIONS: Item improvement must underlie attempts to improve outcome assessment. The clear, personally important and relevant, ability-framed items in the PROMIS Physical Function item bank perform well in PRO assessment. They will benefit from further study and application in a wider variety of rheumatic diseases, in diverse clinical groups (including those at the extremes of physical functioning), and in different administration modes.
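    The gain from five-option items over coarser Legacy items can be made concrete with the graded response model used for parameter estimation above. Below is a minimal numpy sketch of item information under that model; the discrimination and threshold values are illustrative stand-ins, not PROMIS estimates.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information for one graded-response-model item.

    theta : latent-trait values (array)
    a     : discrimination parameter
    b     : sorted category-boundary difficulties (len = n_options - 1)
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    # Boundary curves P*_k(theta), padded with P*_0 = 1 and P*_{K+1} = 0.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - np.asarray(b)[None, :])))
    p_star = np.hstack([np.ones((len(theta), 1)), p_star, np.zeros((len(theta), 1))])
    p_cat = p_star[:, :-1] - p_star[:, 1:]      # category probabilities, sum to 1
    w = p_star * (1.0 - p_star)                 # dP*/dtheta = a * P*(1 - P*)
    dp = a * (w[:, :-1] - w[:, 1:])             # dP_k/dtheta
    return np.sum(dp**2 / np.clip(p_cat, 1e-12, None), axis=1)

theta = np.linspace(-3, 3, 7)
print(grm_item_information(theta, a=1.8, b=[-1.0, 0.0, 1.0, 2.0]))  # five response options
print(grm_item_information(theta, a=1.8, b=[-1.0, 1.0]))            # three response options
```

    At equal discrimination, the five-option version typically carries more information across the trait range than the three-option version, which is the pattern the revised HAQ-DI and PF-10 items showed.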

    Development and Validation of the Computerised Adaptive Beat Alignment Test (CA-BAT)

    Beat perception is increasingly being recognised as a fundamental musical ability. A number of psychometric instruments have been developed to assess this ability, but these tests do not take advantage of modern psychometric techniques and rarely receive systematic validation. The present research addresses this gap in the literature by developing and validating a new test, the Computerised Adaptive Beat Alignment Test (CA-BAT), a variant of the Beat Alignment Test (BAT) that leverages recent advances in psychometric theory, including item response theory, adaptive testing, and automatic item generation. The test is constructed and validated in four empirical studies. The results support the reliability and validity of the CA-BAT for laboratory testing, but suggest that the test is not well suited to online testing, owing to its reliance on fine perceptual discrimination.
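    A common way to implement the adaptive part of such a test is to select, at each step, the item with maximum Fisher information at the current ability estimate. The sketch below does this for a plain two-parameter logistic (2PL) bank with an expected-a-posteriori (EAP) ability update; the item bank, parameters, and test length are invented for illustration and are not the CA-BAT's (which additionally relies on automatic item generation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2PL item bank (not the CA-BAT's items or parameters).
a = rng.uniform(0.8, 2.0, size=50)   # discriminations
b = rng.normal(0.0, 1.0, size=50)    # difficulties

def p_correct(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap(responses, items, grid=np.linspace(-4, 4, 161)):
    """Expected-a-posteriori ability estimate under a standard-normal prior."""
    like = np.ones_like(grid)
    for item, x in zip(items, responses):
        p = p_correct(grid, a[item], b[item])
        like *= p if x else (1.0 - p)
    post = like * np.exp(-grid**2 / 2)
    return np.sum(grid * post) / np.sum(post)

true_theta, theta_hat = 0.7, 0.0
asked, answers = [], []
for _ in range(15):                          # fixed test length for the sketch
    info = a**2 * p_correct(theta_hat, a, b) * (1 - p_correct(theta_hat, a, b))
    info[asked] = -np.inf                    # never repeat an item
    item = int(np.argmax(info))              # maximum-information selection
    x = rng.random() < p_correct(true_theta, a[item], b[item])  # simulated answer
    asked.append(item)
    answers.append(x)
    theta_hat = eap(answers, asked)
print(f"estimated ability: {theta_hat:.2f} (true: {true_theta})")
```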

    On environment difficulty and discriminating power

    The final publication is available at Springer via http://dx.doi.org/10.1007/s10458-014-9257-1
    This paper presents a way to estimate the difficulty and discriminating power of any task instance. We focus on a very general setting for tasks: interactive (possibly multi-agent) environments in which an agent acts upon observations and rewards. Instead of analysing the complexity of the environment, the state space, or the actions performed by the agent, we analyse the performance of a population of agent policies against the task, leading to a distribution that is examined in terms of policy complexity. This distribution is then sliced by the algorithmic complexity of the policy and analysed through several diagrams and indicators. The notion of environment response curve is also introduced, by inverting the performance results into an ability scale. We apply all these concepts, diagrams, and indicators to two illustrative problems: a class of agent-populated elementary cellular automata, showing how difficulty and discriminating power may vary across environments, and a multi-agent system in which agents can become predators or prey and may need to coordinate. Finally, we discuss how these tools can be applied to characterise (interactive) tasks and (multi-agent) environments; these characterisations can then be used to gain more insight into agent performance and to facilitate the development of adaptive tests for the evaluation of agent abilities. (A toy sketch of the population-based estimate follows the references below.)
    Acknowledgements: I thank the reviewers for their comments, especially those aiming at a clearer connection with the field of multi-agent systems and the suggestion of better approximations for the calculation of the response curves. The implementation of the elementary cellular automata used in the environments is based on the 'CellularAutomaton' library for R by John Hughes [58]. I am grateful to Fernando Soler-Toscano for letting me know about their work [65] on the complexity of 2D objects generated by elementary cellular automata. I would also like to thank David L. Dowe for his comments on a previous version of this paper. This work was supported by the MEC/MINECO projects CONSOLIDER-INGENIO CSD2007-00022 and TIN 2010-21062-C02-02, the GVA project PROMETEO/2008/051, the COST (European Cooperation in the field of Scientific and Technical Research) Action IC0801 on Agreement Technologies, and the REFRAME project, granted by the European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-Net (CHIST-ERA) and funded by the Ministerio de Economía y Competitividad in Spain (PCIN-2013-037).
    Citation: José Hernández-Orallo (2015). On environment difficulty and discriminating power. Autonomous Agents and Multi-Agent Systems, 29(3), 402–454. https://doi.org/10.1007/s10458-014-9257-1
    References:
    [1] Anderson, J., Baltes, J., & Cheng, C. T. (2011). Robotics competitions as benchmarks for AI research. The Knowledge Engineering Review, 26(1), 11–17.
    [2] Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In Proceedings of the National Conference on Artificial Intelligence (pp. 119–125). AAAI Press/MIT Press.
    [3] Antunes, L., Fortnow, L., van Melkebeek, D., & Vinodchandran, N. V. (2006). Computational depth: Concept and applications. Theoretical Computer Science, 354(3), 391–404.
    [4] Arai, K., Kaminka, G. A., Frank, I., & Tanaka-Ishii, K. (2003). Performance competitions as research infrastructure: Large scale comparative studies of multi-agent teams. Autonomous Agents and Multi-Agent Systems, 7(1–2), 121–144.
    [5] Ashcraft, M. H., Donley, R. D., Halas, M. A., & Vakali, M. (1992). Working memory, automaticity, and problem difficulty. In J. I. D. Campbell (Ed.), The nature and origins of mathematical skills (Advances in Psychology, Vol. 91, pp. 301–329). North-Holland.
    [6] Ay, N., Müller, M., & Szkola, A. (2010). Effective complexity and its relation to logical depth. IEEE Transactions on Information Theory, 56(9), 4593–4607.
    [7] Barch, D. M., Braver, T. S., Nystrom, L. E., Forman, S. D., Noll, D. C., & Cohen, J. D. (1997). Dissociating working memory from task difficulty in human prefrontal cortex. Neuropsychologia, 35(10), 1373–1380.
    [8] Bordini, R. H., Hübner, J. F., & Wooldridge, M. (2007). Programming multi-agent systems in AgentSpeak using Jason. Wiley.
    [9] Boutilier, C., Reiter, R., Soutchanski, M., & Thrun, S. (2000). Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the National Conference on Artificial Intelligence (pp. 355–362). AAAI Press/MIT Press.
    [10] Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2), 156–172.
    [11] Chaitin, G. J. (1977). Algorithmic information theory. IBM Journal of Research and Development, 21, 350–359.
    [12] Chedid, F. B. (2010). Sophistication and logical depth revisited. In 2010 IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) (pp. 1–4). IEEE.
    [13] Cheeseman, P., Kanefsky, B., & Taylor, W. M. (1991). Where the really hard problems are. In Proceedings of IJCAI-1991 (pp. 331–337).
    [14] Dastani, M. (2008). 2APL: A practical agent programming language. Autonomous Agents and Multi-Agent Systems, 16(3), 214–248.
    [15] Delahaye, J. P., & Zenil, H. (2011). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63–77.
    [16] Dowe, D. L. (2008). Foreword re C. S. Wallace. Computer Journal, 51(5), 523–560. Christopher Stewart Wallace (1933–2004) memorial special issue.
    [17] Dowe, D. L., & Hernández-Orallo, J. (2012). IQ tests are not for machines, yet. Intelligence, 40(2), 77–81.
    [18] Du, D. Z., & Ko, K. I. (2011). Theory of computational complexity (Vol. 58). Wiley-Interscience.
    [19] Elo, A. E. (1978). The rating of chessplayers, past and present (Vol. 3). Batsford.
    [20] Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Lawrence Erlbaum.
    [21] Fatès, N., & Chevrier, V. (2010). How important are updating schemes in multi-agent systems? An illustration on a multi-turmite model. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 533–540). IFAAMAS.
    [22] Ferber, J., & Müller, J. P. (1996). Influences and reaction: A model of situated multiagent systems. In Proceedings of the Second International Conference on Multi-Agent Systems (ICMAS-96) (pp. 72–79).
    [23] Ferrando, P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33(1), 9–24.
    [24] Ferrando, P. J. (2012). Assessing the discriminating power of item and test scores in the linear factor-analysis model. Psicológica, 33, 111–139.
    [25] Gent, I. P., & Walsh, T. (1994). Easy problems are sometimes hard. Artificial Intelligence, 70(1), 335–345.
    [26] Gershenson, C., & Fernandez, N. (2012). Complexity and information: Measuring emergence, self-organization, and homeostasis at multiple scales. Complexity, 18(2), 29–44.
    [27] Gruner, S. (2010). Mobile agent systems and cellular automata. Autonomous Agents and Multi-Agent Systems, 20(2), 198–233.
    [28] Hardman, D. K., & Payne, S. J. (1995). Problem difficulty and response format in syllogistic reasoning. The Quarterly Journal of Experimental Psychology, 48(4), 945–975.
    [29] He, J., Reeves, C., Witt, C., & Yao, X. (2007). A note on problem difficulty measures in black-box optimization: Classification, realizations and predictability. Evolutionary Computation, 15(4), 435–443.
    [30] Hernández-Orallo, J. (2000). Beyond the Turing test. Journal of Logic, Language and Information, 9(4), 447–466.
    [31] Hernández-Orallo, J. (2000). On the computational measurement of intelligence factors. In A. Meystel (Ed.), Performance metrics for intelligent systems workshop (pp. 1–8). National Institute of Standards and Technology.
    [32] Hernández-Orallo, J. (2000). Thesis: Computational measures of information gain and reinforcement in inference processes. AI Communications, 13(1), 49–50.
    [33] Hernández-Orallo, J. (2010). A (hopefully) non-biased universal environment class for measuring intelligence of biological and artificial systems. In M. Hutter et al. (Eds.), 3rd International Conference on Artificial General Intelligence (pp. 182–183). Atlantis Press. Extended report at http://users.dsic.upv.es/proy/anynt/unbiased.pdf
    [34] Hernández-Orallo, J., & Dowe, D. L. (2010). Measuring universal intelligence: Towards an anytime intelligence test. Artificial Intelligence, 174(18), 1508–1539.
    [35] Hernández-Orallo, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Insa-Cabrera, J. (2011). On more realistic environment distributions for defining, evaluating and developing intelligence. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), Artificial General Intelligence 2011 (LNAI, Vol. 6830, pp. 82–91). Springer.
    [36] Hernández-Orallo, J., Dowe, D. L., & Hernández-Lloreda, M. V. (2014). Universal psychometrics: Measuring cognitive abilities in the machine kingdom. Cognitive Systems Research, 27, 50–74.
    [37] Hernández-Orallo, J., Insa, J., Dowe, D. L., & Hibbard, B. (2012). Turing tests with Turing machines. In A. Voronkov (Ed.), The Alan Turing Centenary Conference, Turing-100 (EPiC Series, Vol. 10, pp. 140–156).
    [38] Hernández-Orallo, J., & Minaya-Collado, N. (1998). A formal definition of intelligence based on an intensional variant of Kolmogorov complexity. In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98) (pp. 146–163). ICSC Press.
    [39] Hibbard, B. (2009). Bias and no free lunch in formal measures of intelligence. Journal of Artificial General Intelligence, 1(1), 54–61.
    [40] Hoos, H. H. (1999). SAT-encodings, search space structure, and local search performance. In Proceedings of the 16th International Joint Conference on Artificial Intelligence (pp. 296–303).
    [41] Insa-Cabrera, J., Benacloch-Ayuso, J. L., & Hernández-Orallo, J. (2012). On measuring social intelligence: Experiments on competition and cooperation. In J. Bach, B. Goertzel, & M. Iklé (Eds.), AGI (LNCS, Vol. 7716, pp. 126–135). Springer.
    [42] Insa-Cabrera, J., Dowe, D. L., España-Cubillo, S., Hernández-Lloreda, M. V., & Hernández-Orallo, J. (2011). Comparing humans and AI agents. In J. Schmidhuber, K. R. Thórisson, & M. Looks (Eds.), Artificial General Intelligence 2011 (LNAI, Vol. 6830, pp. 122–132). Springer.
    [43] Knuth, D. E. (1973). Sorting and searching (The Art of Computer Programming, Vol. 3). Addison-Wesley.
    [44] Kotovsky, K., & Simon, H. A. (1990). What makes some problems really hard: Explorations in the problem space of difficulty. Cognitive Psychology, 22(2), 143–183.
    [45] Legg, S. (2008). Machine super intelligence. PhD thesis, Department of Informatics, University of Lugano.
    [46] Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds and Machines, 17(4), 391–444.
    [47] Leonetti, M., & Iocchi, L. (2010). Improving the performance of complex agent plans through reinforcement learning. In Proceedings of the 2010 International Conference on Autonomous Agents and Multiagent Systems (Vol. 1, pp. 723–730). IFAAMAS.
    [48] Levin, L. A. (1973). Universal sequential search problems. Problems of Information Transmission, 9(3), 265–266.
    [49] Levin, L. A. (1986). Average case complete problems. SIAM Journal on Computing, 15, 285.
    [50] Li, M., & Vitányi, P. (2008). An introduction to Kolmogorov complexity and its applications (3rd ed.). Springer.
    [51] Low, C. K., Chen, T. Y., & Rönnquist, R. (1999). Automated test case generation for BDI agents. Autonomous Agents and Multi-Agent Systems, 2(4), 311–332.
    [52] Madden, M. G., & Howley, T. (2004). Transfer of experience between reinforcement learning environments with progressive difficulty. Artificial Intelligence Review, 21(3), 375–398.
    [53] Mellenbergh, G. J. (1994). Generalized linear item response theory. Psychological Bulletin, 115(2), 300.
    [54] Michel, F. (2004). Formalisme, outils et éléments méthodologiques pour la modélisation et la simulation multi-agents. PhD thesis, Université des sciences et techniques du Languedoc, Montpellier.
    [55] Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81.
    [56] Orponen, P., Ko, K. I., Schöning, U., & Watanabe, O. (1994). Instance complexity. Journal of the ACM, 41(1), 96–121.
    [57] Simon, H. A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70(6), 534.
    [58] R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
    [59] Whiteson, S., Tanner, B., & White, A. (2010). The reinforcement learning competitions. AI Magazine, 31(2), 81–94.
    [60] Wiering, M., & van Otterlo, M. (Eds.). (2012). Reinforcement learning: State-of-the-art. Springer.
    [61] Wolfram, S. (2002). A new kind of science. Wolfram Media.
    [62] Zatuchna, Z., & Bagnall, A. (2009). Learning mazes with aliasing states: An LCS algorithm with associative perception. Adaptive Behavior, 17(1), 28–57.
    [63] Zenil, H. (2010). Compression-based investigation of the dynamical properties of cellular automata and other systems. Complex Systems, 19(1), 1–28.
    [64] Zenil, H. (2011). Une approche expérimentale à la théorie algorithmique de la complexité. PhD thesis, Université de Lille.
    [65] Zenil, H., Soler-Toscano, F., Delahaye, J. P., & Gauvrit, N. (2012). Two-dimensional Kolmogorov complexity and validation of the coding theorem method by compressibility. arXiv preprint arXiv:1212.6745
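    As flagged in the abstract above, here is a toy illustration of reading difficulty and discriminating power off the score distribution of a policy population. The environment, policy representation, and indicators are deliberately simplistic stand-ins for the paper's general interactive setting, not its method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for a task: the environment is a hidden binary pattern and a
# policy scores the fraction of positions it matches.
def score(policy, env):
    return float(np.mean(policy == env))

def difficulty_and_discrimination(env, n_policies=5000, threshold=0.75):
    """Estimate task difficulty and discriminating power from a policy population."""
    policies = rng.integers(0, 2, size=(n_policies, env.size))
    scores = np.array([score(p, env) for p in policies])
    difficulty = 1.0 - scores.mean()        # harder tasks -> lower average reward
    discrimination = scores.std()           # a flat score distribution discriminates poorly
    solved = (scores >= threshold).mean()   # one point of a response-curve-like summary
    return difficulty, discrimination, solved

env = rng.integers(0, 2, size=20)
print(difficulty_and_discrimination(env))
```

    The paper goes much further, slicing this distribution by the algorithmic complexity of the policies and inverting performance into an ability scale to obtain environment response curves.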

    Use of non-parametric Item Response Theory to develop a shortened version of the Positive and Negative Syndrome Scale (PANSS)

    Background: Nonparametric item response theory (IRT) was used to examine (a) the performance of the 30 Positive and Negative Syndrome Scale (PANSS) items and their options (levels of severity), (b) the effectiveness of the various subscales in discriminating among differences in symptom severity, and (c) the development of an abbreviated PANSS (Mini-PANSS) based on IRT, together with a method for linking its scores to the original PANSS. Methods: Baseline PANSS scores were obtained from 7,187 patients with schizophrenia or schizoaffective disorder who were enrolled between 1995 and 2005 in psychopharmacology trials. Option characteristic curves (OCCs) and item characteristic curves (ICCs) were constructed to examine the probability of rating each of seven options within each of the 30 PANSS items as a function of subscale severity, and summed-score linking was applied to the items selected for the Mini-PANSS. Results: The majority of items forming the Positive and Negative subscales (19 items) performed very well and discriminated better along symptom severity than the General Psychopathology subscale. Six of the seven Positive Symptom items, six of the seven Negative Symptom items, and seven of the 16 General Psychopathology items were retained for inclusion in the Mini-PANSS. Summed-score linking and linear interpolation produced a translation table for comparing total subscale scores on the Mini-PANSS with total subscale scores on the original PANSS, with very little bias. Conclusions: The study demonstrated the utility of non-parametric IRT for examining the item properties of the PANSS and for selecting items for an abbreviated scale. Comparisons between the 30-item PANSS and the Mini-PANSS showed that the shorter version is comparable to the 30-item PANSS and, under IRT, is also a good indicator of illness severity.
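    Option characteristic curves of the kind used here can be estimated non-parametrically by kernel-smoothing each option indicator against a severity proxy such as the rest score (a TestGraf-style approach). The sketch below is illustrative only; the simulated data, bandwidth, and grid are invented.

```python
import numpy as np

def option_characteristic_curves(item_responses, rest_scores, n_options, bandwidth=2.0):
    """Kernel-smoothed option characteristic curves for one polytomous item.

    item_responses : option chosen per respondent (0 .. n_options-1)
    rest_scores    : total score on the remaining items (severity proxy)
    """
    grid = np.linspace(rest_scores.min(), rest_scores.max(), 50)
    # Gaussian kernel weight of every respondent at every grid point.
    w = np.exp(-0.5 * ((grid[:, None] - rest_scores[None, :]) / bandwidth) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    # P(option k | severity) as a weighted relative frequency.
    occ = np.stack([(w * (item_responses == k)).sum(axis=1) for k in range(n_options)])
    return grid, occ   # occ[k, g]: smoothed probability of option k at grid[g]

rng = np.random.default_rng(2)
severity = rng.normal(30, 10, size=500)                            # fake severity scores
noise = rng.normal(0, 0.7, size=500)
item = np.clip(np.round(severity / 10 + noise), 0, 6).astype(int)  # fake 7-option item
grid, occ = option_characteristic_curves(item, severity, n_options=7)
print(occ.sum(axis=0)[:5])   # each column is a probability distribution over options
```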

    KIDMAP, a web based system for gathering patients' feedback on their doctors

    Background: The gathering of feedback on doctors from patients after consultations is an important part of patient involvement and participation. This study first assesses the 23-item Patient Feedback Questionnaire (PFQ), designed by the Picker Institute Europe, to determine whether its items form a single latent trait. An Internet module with visual representation is then developed to gather patient views about their doctors and to distribute the individualized results by email. Methods: A total of 450 patients were randomly recruited from a 1,300-bed medical center in Taiwan. The Rasch rating scale model was used to examine data-model fit. Differential item functioning (DIF) analysis was conducted to verify construct equivalence across groups. An Internet module with visual representation was developed to provide doctors with the patients' online feedback. Results: Twenty-one of the 23 items met the model's expectation, namely that they constitute a single construct. Test reliability was 0.94. DIF was found across ages and kinds of disease, but not across genders or education levels. The visual approach of the web-based KIDMAP module appeared to be an effective way to assess patient feedback in a clinical setting. Conclusion: The revised 21-item PFQ measures a single construct. Our work supports the hypothesis that the revised online PFQ is both valid and reliable, and that the KIDMAP module is good at its designated task. Further research is needed to confirm data congruence for patients with chronic diseases.
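    The Rasch rating scale model used here assigns every item one location and a shared set of thresholds. Below is a small sketch of its category probabilities, plus the model-expected score against which a KIDMAP-style residual can be computed; all parameter values are invented.

```python
import numpy as np

def rating_scale_probs(theta, beta, tau):
    """Category probabilities under the Rasch rating scale model.

    theta : person measure
    beta  : item location
    tau   : shared threshold parameters (len = n_categories - 1)
    """
    # Numerators exp(sum_{j<=k} (theta - beta - tau_j)), with the empty sum = 0.
    steps = theta - beta - np.asarray(tau, dtype=float)
    log_num = np.concatenate([[0.0], np.cumsum(steps)])
    num = np.exp(log_num - log_num.max())   # stabilised against overflow
    return num / num.sum()

probs = rating_scale_probs(theta=0.5, beta=-0.2, tau=[-1.0, 0.0, 1.0])
cats = np.arange(len(probs))
expected = np.sum(cats * probs)                           # model-expected rating
sd = np.sqrt(np.sum((cats - expected) ** 2 * probs))      # model standard deviation
observed = 3                                              # a hypothetical observed rating
print(f"expected {expected:.2f}, standardised residual {(observed - expected) / sd:.2f}")
```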

    Differential Item Functioning on Antisocial Behavior Scale Items for Adolescents and Young Adults from Single-Parent and Two-Parent Families

    We investigated measurement equivalence in two antisocial behavior scales (i.e., one scale for adolescents and a second scale for young adults) by examining differential item functioning (DIF) for respondents from single-parent (n = 109) and two-parent families (n = 447). Even though one item in the scale for adolescents and two items in the scale for young adults showed significant DIF, the two scales exhibited non-significant differential test functioning (DTF). Both uniform and nonuniform DIF were investigated, and examples of each type were identified: uniform DIF was exhibited in the adolescent scale, whereas nonuniform DIF was shown in the young adult scale. Implications of the DIF results for the assessment of antisocial behavior, along with the strengths and limitations of the study, are discussed.
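    Uniform versus nonuniform DIF of the kind reported here is often screened with nested logistic regressions: adding a group main effect to a score-only model tests uniform DIF, and adding a group-by-score interaction tests nonuniform DIF. The sketch below simulates one dichotomous item with uniform DIF built in (the group sizes mirror the study; everything else is invented) and fits the models with statsmodels.

```python
import numpy as np
import statsmodels.api as sm

def logistic_dif_test(item, total, group):
    """Likelihood-ratio DIF screen for one dichotomous item.

    Compares nested logits: score only; + group (uniform DIF);
    + group x score (nonuniform DIF). Returns the two LR chi-squares (df = 1 each).
    """
    X1 = sm.add_constant(np.column_stack([total]))
    X2 = sm.add_constant(np.column_stack([total, group]))
    X3 = sm.add_constant(np.column_stack([total, group, total * group]))
    ll = [sm.Logit(item, X).fit(disp=0).llf for X in (X1, X2, X3)]
    return 2 * (ll[1] - ll[0]), 2 * (ll[2] - ll[1])

rng = np.random.default_rng(3)
n = 556                                              # 109 single-parent + 447 two-parent
group = (rng.random(n) < 109 / 556).astype(float)    # single-parent indicator
theta = rng.normal(size=n)                           # latent antisocial behavior
total = 5 * theta + 15 + rng.normal(0, 2, size=n)    # rest-of-scale score
p = 1 / (1 + np.exp(-(theta + 0.8 * group)))         # uniform DIF: group shifts endorsement
item = (rng.random(n) < p).astype(float)
print(logistic_dif_test(item, total, group))         # the first chi-square should be large
```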